Virtual interconnections for reconfigurable logic systems

ABSTRACT

A compilation technique overcomes device pin limitations using virtual interconnections. Virtual interconnections overcome pin limitations by intelligently multiplexing each physical wire among multiple logical wires and pipelining these connections at the maximum clocking frequency. Virtual interconnections increase usable bandwidth and relax the absolute limits imposed on gate utilization in logic emulation systems employing Field Programmable Gate Arrays (FPGAs). A "softwire" compiler utilizes static routing and relies on minimal hardware support. The technique can be applied to any topology and FPGA device.

RELATED APPLICATIONS

This application is the U.S. National phase of International Application No. PCT/US94/03620, filed Apr. 1, 1994, which claimed priority to U.S. Ser. No. 08/042,151, filed Apr. 2, 1993, now U.S. Pat. No. 5,596,742; the teachings of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Field Programmable Gate Array (FPGA) based logic emulators are capable of emulating complex logic designs at clock speeds four to six orders of magnitude faster than even an accelerated software simulator. Once configured, an FPGA-based emulator is a heterogeneous network of special purpose processors, each FPGA processor being specifically designed to cooperatively execute a partition of the overall simulated circuit. As parallel processors, these emulators are characterized by their interconnection topology (network), target FPGA (processor), and supporting software (compiler). The interconnection topology describes the arrangement of FPGA devices and routing resources (e.g., full crossbar or two-dimensional mesh). Important target FPGA properties include gate count (computational resources), pin count (communication resources), and mapping efficiency. Supporting software is extensive, combining netlist translators, logic optimizers, technology mappers, global and FPGA-specific partitioners, placers, and routers.

FPGA-based logic emulation systems have been developed for design complexity ranging from several thousand to several million gates. Typically, the software for these systems is considered the most complex component. Emulation systems have been developed that interconnect FPGAs in a two-dimensional mesh and in a partial crossbar topology. In addition, a hierarchical approach to interconnection has been developed. Another approach uses a combination of nearest neighbor and crossbar interconnections. Logic partitions are typically hardwired to FPGAs following partition placement.

Statically routed networks can be used whenever communication can be predetermined. Static refers to the fact that all data movement can be determined and optimized at compile-time. This mechanism has been used in scheduling real-time communication in a multiprocessor environment. Other related uses of static routing include FPGA-based systolic arrays and the very large simulation subsystem (VLSS), a massively parallel simulation engine which uses time-division multiplexing to stagger logic evaluation.

In prior systems, circuit switching techniques are used to provide output signals from one chip to another chip. A given output pin of one chip can be directly connected to a given input pin of another chip or provided during a dedicated time slot over a bus. The entire path of the signal through the bus is dedicated, using assigned bus pins and time slots to provide a direct connection during any time slot. A full resource is thus used to transmit the signal from the output chip to the input chip. An example of such a prior art system is discussed in Van Den Bout, AnyBoard: An FPGA-Based Reconfigurable System, IEEE Design and Test of Computers (Sept. 1992), pp. 21-30.

SUMMARY OF THE INVENTION

Existing FPGA-based logic emulators suffer from limited inter-chip communication bandwidth, resulting in low gate utilization (10 to 20 percent). This resource imbalance increases the number of chips needed to emulate a particular logic design and thereby decreases emulation speed, because signals must cross more chip boundaries, and increases system cost. Prior art emulators only use a fraction of potential communication bandwidth because the prior art emulators dedicate each FPGA pin (physical wire) to a single emulated signal (logical wire). These logical wires are not active simultaneously and are only switched at emulation clock speeds.

A preferred embodiment of the invention presents a compilation technique to overcome device pin limitations using virtual interconnections. This method can be applied to any topology and FPGA device, although some benefit substantially more than others. Although a preferred embodiment of the invention focuses on logic emulation, the technique of virtual interconnections is also applicable to other areas of reconfigurable logic. Such reconfigurable logic systems (RLS) include, but are not limited to, simulation acceleration systems, rapid prototyping systems, multiple FPGA systems and virtual computing systems.

Virtual interconnections overcome pin limitations by intelligently multiplexing each physical wire among multiple logical interconnections and pipelining these connections at the maximum clocking frequency of the FPGA. A virtual interconnection represents a connection from a logical output on one FPGA to a logical input on another FPGA. Virtual interconnections not only increase usable bandwidth, but also relax the absolute limits imposed on gate utilization. The resulting improvement in bandwidth reduces the need for global interconnect, allowing effective use of low dimension inter-chip connections (such as nearest-neighbor). In a preferred embodiment, a "softwire" compiler utilizes static routing and relies on minimal hardware support. Virtual interconnections can increase FPGA gate utilization beyond 80% without a significant slowdown in emulation speed.

In a preferred embodiment of the invention, an FPGA logic emulation system comprises a plurality of FPGA modules. Each module is preferably a chip having a number of pins for communicating signals between chips. There are also interchip connections between the FPGA pins. In addition, a software or hardware compiler programs each FPGA chip to emulate a partition of an emulated circuit with interconnections between partitions of the emulated circuit being provided through FPGA pins and interchip connections. A partition of the emulated circuit has a number of interconnections to other partitions that exceeds the number of pins on the FPGA chip. The chip is programmed to communicate through virtual interconnections in a time-multiplexed fashion through the pins. The inter-chip communications include interconnections which extend through the intermediate FPGA chips.

The FPGA chips may comprise gates that are programmed to serve as a multiplexer for communicating through the virtual interconnections. Alternatively, the FPGA chips may comprise hardwire multiplexers that are separate from the programmable gates. The interconnections may be point-to-point between pins, over a bus, or other interconnection networks. The pins of the FPGA chips may be directly connected to pins of other FPGA chips, where signals between the chips are routed through intermediate FPGAs. The FPGA chips may also be programmed to operate in phases within an emulation clock cycle with interchip communications being performed within each phase.

The compiler may optimize partition selection and phase division of an emulated circuit based on interpartition dependencies.

Data may also be accessed from memory elements external to the FPGAs during each phase by multiplexing the data on the virtual interconnections.

In a preferred embodiment of the invention, the FPGA chips comprise logic cells as an array of gates, shift registers, and several multiplexers. The gates are programmable to emulate a logic circuit. Each shift register receives plural outputs from the programmed gate array and communicates the outputs through a single pin in a multiplexed fashion. Some fraction of the gates in an FPGA chip may be programmed to serve as shift registers and multiplexers for communicating through virtual connections.

In a preferred embodiment of the invention, a compiler configures an FPGA logic emulation system using a partitioner for partitioning an emulated logic circuit and a programming mechanism for programming each FPGA to emulate a partition of an emulated circuit. The partitions are to be programmed into individual FPGA chips. The compiler produces virtual interconnections between partitions of the emulated circuit that correspond to one or more common pins with signals along the virtual interconnections being time-multiplexed through the common pin.

The compiler may comprise a dependency analyzer and a divider for dividing an emulation clock into phases, the phase division being a function of partition dependencies and memory assignments. During the phases, program logic functions are performed and signals are transmitted between the FPGA chips. The compiler may also comprise a router for programming the FPGA chips to route signals between chips through intermediate chips. In particular, the routed signals are virtual interconnections.

Results from compiling two complex designs, the 18K gate SPARCLE microprocessor and the 86K gate Alewife Cache Compiler (A-1000), show that the use of virtual interconnections decreases FPGA chip count by a factor of 3 for SPARCLE and 10 for the A-1000, assuming a crossbar interconnect. With virtual interconnections, a two-dimensional torus interconnect can be used for only a small increase in chip count (17 percent for the A-1000 and 0 percent for SPARCLE). Without virtual interconnections, the cost of replacing the full crossbar with a torus interconnect is over 300 percent for SPARCLE, and virtually impossible for the A-1000. Emulation speeds are comparable with the no virtual interconnections case, ranging from 2 to 8 MHz for SPARCLE and 1 to 3 MHz for the A-1000. Neither design was bandwidth limited, but rather constrained by its critical path. With virtual interconnections, use of a lower dimension network reduces emulation speed in proportion to the network diameter: a factor of 2 for SPARCLE and 6 for the A-1000 on a two-dimensional torus.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features of the invention, including various novel details of construction and combinations of parts, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular virtual interconnection technique embodying the invention is shown by way of illustration only and not as a limitation of the invention. The principles and features of this invention may be employed in varied and numerous embodiments without departing from the scope of the invention.

FIG. 1 is a block diagram of a typical prior art logic emulation system.

FIG. 2 is a block diagram of a prior art hardwire interconnect system between Field Programmable Gate Arrays (FPGA) 10 of FIG. 1.

FIG. 3 is a block diagram of a virtual interconnection interconnect system between FPGAs 10 of FIG. 1.

FIG. 4 is a graphical representation of an emulation phase clocking scheme.

FIG. 5 is a flowchart of a preferred software compiler.

FIG. 6 is a block diagram of a preferred shift register or shift loop architecture.

FIG. 7 is a block diagram of the intermediate hop, single bit, pipeline stage of FIG. 6.

FIG. 8 is a graph illustrating pin count as a function of FPGA partition size.

FIG. 9 is a graph illustrating a determination of optimal partition size.

FIG. 10 is a graph illustrating emulation speed vs. pin count for a torus and a crossbar configuration.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

Although aspects of the invention are applicable to simulator systems, the invention is particularly advantageous in emulator systems where the emulator may be directly connected to peripheral circuitry. Pins for interchip communications can be limited by multiplexing interchip signals, yet input/output signals may be assigned dedicated pins for connection to the peripheral circuitry.

FIG. 1 is a block diagram of a typical prior art logic emulation system 5. The performance of the system 5 is achieved by partitioning a logic design, described by a netlist, across an interconnected array of FPGAs 10. This array is connected to a host workstation 2 which is capable of downloading design configurations, and is directly wired into the target system 8 for the logic design. Memory elements 6 may also be connected to the array of FPGAs 10. The netlist partition on each FPGA (hereinafter FPGA partition), configured directly into logic circuitry, can then be executed at hardware speeds.

In existing architectures, shown in FIG. 2, both the logic configuration and the network connectivity remain fixed for the duration of the emulation. FIG. 2 shows an example of six logical wires 11a-f, 19'a-f allocated to six physical interconnections 15a-f. Each emulated gate is mapped to one FPGA equivalent gate and each emulated signal is allocated to one FPGA pin. Thus, for a partition to be feasible, the partition gate and pin requirements must be no greater than the available FPGA resources. This constraint yields the following possible scenarios for each FPGA partition:

1. Gate limited: no unused gates, but some unused pins.

2. Pin limited: no unused pins, but some unused gates.

3. Not limited: unused FPGA pins and gates.

4. Balanced: no unused pins or gates.
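The four scenarios above can be expressed as a small classification routine. The following is only an illustrative sketch; the function name and its arguments are hypothetical and do not appear in the described compiler.

```python
def classify_partition(gates_used, pins_used, fpga_gates, fpga_pins):
    """Classify a hardwired partition by which FPGA resource it exhausts."""
    # Feasibility: gate and pin requirements must not exceed the device.
    if gates_used > fpga_gates or pins_used > fpga_pins:
        return "infeasible"
    gates_full = gates_used == fpga_gates
    pins_full = pins_used == fpga_pins
    if gates_full and pins_full:
        return "balanced"
    if gates_full:
        return "gate limited"
    if pins_full:
        return "pin limited"
    return "not limited"


# Hypothetical partition that uses every pin of a 5000 gate, 100 pin device
# while leaving most gates idle.
print(classify_partition(1000, 100, 5000, 100))
```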

For mapping typical circuits onto available FPGA devices, partitions are predominately pin limited; all available gates cannot be utilized due to a lack of pin resources to support them. Low utilization of gate resources increases both the number of FPGAs 10 needed for emulation and the time required to emulate a particular design. Pin limits set a hard upper bound on the maximum usable gate count any FPGA gate size can provide. This discrepancy will only get worse as technology scales; trends (and geometry) indicate that available gate counts are increasing faster than available pin counts.

In a preferred embodiment of the invention, shown in FIG. 3, virtual interconnections are used to overcome pin limitations in FPGA-based logic emulators. FIG. 3 shows an example of six logical wires 11a-f sharing a single physical wire 15x. The physical wire 15x is multiplexed 13 between two pipelined shift loops 20a, 20b, which are discussed in detail below. Pipelining refers to signal streams in a particular phase and multiplexing refers to signals across phases. A virtual interconnection represents a connection between a logical output 11a on one FPGA 10 and a logical input 19'a on another FPGA 10'. Established via a pipelined, statically routed communication network, these virtual interconnections increase available off-chip communication bandwidth by multiplexing 13 the use of FPGA pin resources (physical wires) 15 among multiple emulation signals (logical interconnections).

Virtual interconnections effectively relax pin limitations. Although low pin counts may decrease emulation speed, there is not a hard pin constraint that must be enforced. Emulation speed can be increased if there is a large enough reduction in system size. The gate overhead of using virtual interconnections is low, comprising gates that are not utilized in the purely hardwired implementation. Furthermore, the flexibility of virtual interconnections allows the emulation architecture to be balanced for each logic design application.

The logic emulator or the reconfigurable logic system may emulate a logic design that has a clock. The corresponding clock in the emulation or reconfigurable logic system is an emulation clock. One-to-one allocation of emulation signals (logical wires) 11, 19 to FPGA pins (physical wires) 15 does not exploit available off-chip bandwidth because emulation clock frequencies are one or two orders of magnitude lower than the potential clocking frequency of the FPGA technology, and all logical interconnections 11, 19 are not active simultaneously.

By pipelining and multiplexing physical wires 15, virtual interconnections are created to increase usable bandwidth. By clocking physical wires 15 at the maximum frequency of the FPGA technology, several logical connections can share the same physical resource.

In a logic design, evaluation flows from system inputs to system outputs. In a synchronous design with no combinatorial loops, this flow can be represented as a directed acyclic graph. Thus, through intelligent dependency analysis of the underlying logic circuit, logical values between FPGA partitions need to only be transmitted once. Furthermore, because circuit communication is inherently static, communication patterns repeat in a predictable fashion.

In a preferred embodiment of the invention, virtual interconnections are supported with a "softwire" compiler. This compiler analyzes logic signal dependencies and statically schedules and routes FPGA communication. These results are then used to construct (in the FPGA technology) a statically routed network. This hardware consists of a sequencer and shift loops. The sequencer is a distributed finite state machine. The sequencer establishes virtual connections between FPGAs by strobing logical interconnections in a predetermined order into special shift registers 21, the shift loops 20. The shift loops 20 serve as multiplexers 13 and are described in detail below. Shift loops 20 are then alternately connected to physical wires 15 according to the predetermined schedule established by the sequencer.

The use of virtual interconnections is limited to synchronous logic. Any asynchronous signals must still be "hardwired" to dedicated FPGA pins. This limitation is imposed by the inability to statically determine dependencies in asynchronous loops. Furthermore, each combinational loop (such as a flip-flop) in a synchronous design is completely contained in a single FPGA partition. For simplicity and clarity of description, it is assumed that the emulated logic has a single global clock.

In a preferred embodiment of the invention, virtual interconnections are implemented in the context of a complete emulation software system, independent of target FPGA device and interconnect topology. While this embodiment focuses primarily on software, the ultimate goal of the invention is a low-cost, reconfigurable emulation system.

In a preferred embodiment, the signals are routed through each FPGA by assigning a plurality of pins and time slots through intermediate FPGAs. This embodiment avoids the use of a crossbar. By routing the signals through each FPGA, speed is increased because there are no long wires connecting the FPGAs to a crossbar.

In contrast to prior systems, a preferred embodiment of the invention does not dedicate a signal path from source to destination. In particular, a preferred embodiment of the invention employs static routed packet switching where the wires over which a first signal propagates can be reused by a second signal before the first signal reaches its destination. Thus only a single link in the signal path is dedicated during any system clock period. Indeed, the FPGAs can buffer signals such that higher priority signals can propagate over a wire before a competing lower priority signal.

FIG. 4 graphically represents an emulation phase clocking scheme. The emulation clock period 52x is the clock period of the logic design being emulated. This is broken into evaluation phases (54a, 54b, 54c) to accommodate multiplexing. Multiple phases are required because the combinational logic between flip-flops in the emulated design may be split across multiple FPGA partitions and multiplexing of virtual interconnections prevents direct pass of all signals through the partitions. The phases permit a single pin to send different logical signals on every phase. Within a phase 54, evaluation is accomplished within each partition, and the results are then communicated to other FPGA partitions. Although three phases are illustrated per emulation period, it will be understood that more or fewer phases can be employed.

At the beginning of the phase 54, logical outputs of each FPGA partition are determined by the logical inputs in input shift loops. At the end of the phase 54, outputs are then sent to other FPGA partitions with pipelined shift loops and intermediate hop stages. As illustrated in FIG. 4, these pipelines are clocked with a pipeline clock 56 at the maximum frequency of the FPGA. After all phases 54 within an emulation clock period 52x are complete, the emulation clock 52 is ticked to clock all flip-flops of the target circuit.
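The phase scheme can be illustrated with a toy circuit whose combinational path crosses a partition boundary twice. The sketch below assumes a hypothetical two-FPGA split (FPGA A holds an AND gate, an OR gate, and the flip-flop; FPGA B holds an inverter); the circuit and the function name are invented for illustration only.

```python
def emulate_one_clock(a, b, c):
    """One emulation clock period split into three phases (toy two-FPGA circuit)."""
    trace = []
    # Phase 1: FPGA A evaluates x from its inputs, then ships x to FPGA B.
    x = a & b
    trace.append(("phase 1", "A->B", x))
    # Phase 2: FPGA B evaluates y from the x held in its input shift loop,
    # then ships y back to FPGA A.
    y = 1 - x
    trace.append(("phase 2", "B->A", y))
    # Phase 3: FPGA A evaluates z, the value at the flip-flop's D input.
    z = y | c
    trace.append(("phase 3", "local", z))
    # Emulation clock tick: the flip-flop captures z only after all phases finish.
    flip_flop_q = z
    return flip_flop_q, trace


q, trace = emulate_one_clock(1, 1, 0)
print(q, trace)
```

Three phases are needed here because each boundary crossing must wait for the value produced in the preceding phase.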

The input to the softwire compiler consists of a netlist 105 of the logic design to be emulated, target FPGA device characteristics, and interconnect topology. The compiler then produces a configuration bitstream that can be downloaded into the emulator. FIG. 5 is a flowchart of the compilation steps. Briefly, these steps include translation and mapping of the netlist to the target FPGA technology (step 110), partitioning the netlist (step 120), placing the partitions into interconnect topology (steps 130, 140), routing the inter-node communication paths (steps 150, 160), and finally FPGA-specific automated placement and routing (APR) (step 170).

The input netlist 105 to be emulated is usually generated with a hardware description language or schematic capture program. This netlist 105 must be translated and mapped (step 110) to a library of FPGA macros. It is important to perform this operation before partitioning so that partition gate counts accurately reflect the characteristics of the target FPGAs. Logic optimization tools can also be used at this point to optimize the netlist for the target architecture (considering the system as one large FPGA).

After mapping (step 110) the netlist to the target architecture, the netlist must be partitioned (step 120) into logic blocks that can fit into the target FPGA. With only hardwires, each partition must have both fewer gates and fewer pins than the target device. With virtual interconnections, the total gate count (logic gates and virtual interconnections overhead) must be no greater than the target FPGA gate count. A preferred embodiment uses the Concept Silicon partitioner manufactured by InCA, Inc. This partitioner performs K-way partitioning with min-cut and clustering techniques to minimize partition pin counts.

Because a combinatorial signal may pass through several FPGA partitions during an emulated clock cycle, all signals will not be ready to schedule at the same time. A preferred embodiment solves this problem by only scheduling a partition output once all the inputs it depends upon are scheduled (step 130). An output depends on an input if a change in that input can change the output. To determine input to output dependencies, the logic netlist is analyzed, backtracing from partition outputs to determine which partition inputs they depend upon. In backtracing, it is assumed that all outputs depend on all inputs for gate library parts, and no outputs depend on any inputs for latch (or register) library parts. If there are no combinatorial loops that cross partition boundaries, this analysis produces a directed acyclic graph, the signal flow graph (SFG), to be used by the global router.
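The backtracing rule can be sketched as a graph traversal. The netlist encoding below (a dictionary mapping each signal to a kind and a fan-in list) is an assumption made for illustration and is not the format used by the described compiler.

```python
def input_dependencies(netlist, output):
    """Return the partition inputs that `output` combinationally depends on.

    netlist maps a signal name to (kind, fanin), where kind is
    "input", "gate", or "latch" (hypothetical encoding).
    """
    deps, seen, stack = set(), set(), [output]
    while stack:
        signal = stack.pop()
        if signal in seen:
            continue
        seen.add(signal)
        kind, fanin = netlist[signal]
        if kind == "input":
            deps.add(signal)
        elif kind == "gate":
            # Gate parts: the output is assumed to depend on every input.
            stack.extend(fanin)
        # Latch parts: no output depends on any input, so tracing stops here.
    return deps


netlist = {
    "in0": ("input", []),
    "in1": ("input", []),
    "in2": ("input", []),
    "g0": ("gate", ["in0", "in1"]),
    "r0": ("latch", ["in2"]),
    "out0": ("gate", ["g0", "r0"]),
}
print(sorted(input_dependencies(netlist, "out0")))
```

In this example in2 is excluded because it reaches out0 only through a latch.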

Following logic partitioning, individual FPGA partitions must be placed into specific FPGAs (step 140). An ideal placement minimizes system communication, thus requiring fewer virtual interconnection cycles to transfer information. A preferred embodiment first makes a random placement followed by cost-reduction swaps, and then further optimizes with simulated annealing.

During global routing (step 150), each logical wire is scheduled to a phase, and assigned a pipeline time slot (corresponding to one cycle of the pipeline clock in that phase on a physical wire). Before scheduling, the criticality of each logical wire is determined (based on the signal flow graph produced by dependency analysis). In each phase, the router first determines the schedulable wires. A wire is schedulable if all wires it depends upon have been scheduled in previous phases. The router then uses shortest path analysis with a cost function based on pin utilization to route as many schedulable signals as possible, routing the most critical signals first. Any schedulable signals which cannot be routed are delayed to the next phase.
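The phase assignment loop can be sketched as follows. This is a simplified illustration, not the described router: the shortest path analysis and pin-utilization cost function are replaced by a fixed per-phase capacity, and all names are hypothetical.

```python
def schedule_phases(depends_on, criticality, slots_per_phase):
    """Assign each logical wire to a phase, most critical wires first.

    depends_on: wire -> set of wires it depends on
    criticality: wire -> number (larger means more critical)
    slots_per_phase: wires routable per phase (stand-in for real routing)
    """
    scheduled = {}
    phase = 0
    while len(scheduled) < len(depends_on):
        # A wire is schedulable once everything it depends on was
        # scheduled in an earlier phase.
        ready = [w for w in depends_on
                 if w not in scheduled
                 and all(scheduled.get(d, phase) < phase for d in depends_on[w])]
        if not ready:
            raise ValueError("no schedulable wires: dependency cycle")
        ready.sort(key=lambda w: (-criticality[w], w))
        for wire in ready[:slots_per_phase]:
            scheduled[wire] = phase
        # Wires that did not fit are retried in the next phase.
        phase += 1
    return scheduled


depends_on = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": set()}
criticality = {"a": 3, "b": 2, "c": 1, "d": 2, "e": 1}
print(schedule_phases(depends_on, criticality, slots_per_phase=2))
```

Wire e is schedulable in the first phase but is the least critical, so it is delayed until capacity is available.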

Once routing is completed, appropriately-sized shift loops and associated logic are added to each partition to complete the internal FPGA hardware description (step 160). At this point, there is one netlist for each FPGA. These netlists are then processed with the vendor-specific FPGA place and route software (step 170) to produce configuration bitstreams (step 195).

Technically, there is no required hardware support for implementation of virtual interconnections (unless one considers re-designing an FPGA optimized for virtual interconnecting). The necessary "hardware" is compiled directly into configuration for the FPGA device. Thus, any existing FPGA-based logic emulation system can take advantage of virtual interconnecting. Virtual interconnections can be used to store and retrieve data from memory elements external to the FPGAs by multiplexing the data on the virtual interconnections during a phase. There are many possible ways to implement the hardware support for virtual interconnections. A preferred embodiment employs a simple and efficient implementation. The additional logic to support virtual interconnections can be composed entirely of shift loops and a small amount of phase control logic.

FIG. 6 is a block diagram of a preferred shift loop architecture. A shift loop 20 is a circular, loadable shift register with enabled shift in and shift out ports. Each shift register 21 is capable of performing one or more of the operations of load, store, shift, drive, or rotate. The Load operation strobes logical outputs into the shift loop. The Store operation drives logical inputs from the shift loop. The Shift operation shifts data from a physical input into the shift loop. The Drive operation drives a physical output with the last bit of the shift loop. The Rotate operation rotates bits in the shift loop. In a preferred embodiment, all outputs loaded into a shift loop 20 must have the same final destination FPGA. As described above, a logical output can be strobed once all the inputs it depends upon have been stored. The purpose of rotation is to preserve inputs which have reached their final destination and to eliminate the need for empty gaps in the pipeline when shift loop lengths do not exactly match phase cycle counts. In this way, a signal may be rotated from the shift loop output back to the shift loop input to wait for an appropriate phase. Note that in this implementation the store operation cannot be disabled.
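The five operations can be modeled in software to show how a source loop feeds a destination loop over one physical wire. The class below is a behavioral sketch only; the class name, method names, and bit ordering are assumptions and do not describe the circuit of FIG. 6.

```python
from collections import deque


class ShiftLoop:
    """Toy behavioral model of a circular, loadable shift register."""

    def __init__(self, length):
        self.bits = deque([0] * length, maxlen=length)

    def load(self, logical_outputs):
        """Load: strobe logical outputs into the loop in parallel."""
        if len(logical_outputs) != len(self.bits):
            raise ValueError("length mismatch")
        self.bits = deque(logical_outputs, maxlen=len(logical_outputs))

    def store(self):
        """Store: drive logical inputs from the current loop contents."""
        return list(self.bits)

    def shift(self, physical_input):
        """Shift: take one bit from the physical input; the last bit drops out."""
        self.bits.appendleft(physical_input)

    def drive(self):
        """Drive: present the last bit of the loop on the physical output."""
        return self.bits[-1]

    def rotate(self):
        """Rotate: move the last bit back to the input end, keeping all bits."""
        self.bits.rotate(1)


# Send three logical outputs across one physical wire, one bit per pipeline tick.
source, destination = ShiftLoop(3), ShiftLoop(3)
source.load([1, 0, 1])
for _ in range(3):
    destination.shift(source.drive())
    source.rotate()
print(destination.store())
```

Because the source rotates rather than discards, its contents are intact after the transfer and can wait for a later phase.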

Shift loops 20 can be re-scheduled to perform multiple output operations. However, because the internal latches being emulated depend on the logical inputs, inputs need to be stored until the tick of the emulation clock.

For networks where multiple hops are required (e.g., a mesh), one-bit shift registers 21 that always shift and sometimes drive are used for intermediate stages. FIG. 7 is a block diagram of the intermediate hop pipeline stage. These stages are chained together, one per FPGA hop, to build a pipeline connecting the output shift loop on the source FPGA 10 with the input shift loop on the destination FPGA 10'.
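The effect of chaining one-bit stages can be sketched as a simple pipeline model in which each intermediate hop adds one pipeline clock tick of latency. The function below is a hypothetical illustration and ignores the drive enable logic.

```python
def run_hop_pipeline(bits, hops):
    """Push bits through `hops` one-bit registers.

    Returns a list of (tick, bit) pairs seen at the destination.
    """
    stages = [None] * hops  # None marks an empty pipeline stage
    arrived = []
    # Extra ticks after the last bit flush the pipeline.
    for tick, bit in enumerate(list(bits) + [None] * hops):
        stages.insert(0, bit)   # every stage shifts on every tick
        out = stages.pop()      # the last stage drives the destination
        if out is not None:
            arrived.append((tick, out))
    return arrived


# Three bits crossing two intermediate FPGAs: the first bit arrives at tick 2.
print(run_hop_pipeline([1, 0, 1], hops=2))
```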

The phase control logic is the basic run-time kernel in a preferred embodiment. This kernel is a sequencer that controls the phase enable and strobe (or load) lines, the pipeline clock, and the emulation clock. The phase enable lines are used to enable shift loop to FPGA pin connections. The phase strobe lines strobe the shift loops on the correct phases. This logic is generated with a state machine specifically optimized for a given phase specification.

EXPERIMENTAL RESULTS

The system compiler described above was implemented by developing a dependency analyzer, global placer, global router, and using the InCA partitioner. Except for the partitioner, which can take hours to optimize a complex design, running times on a SPARC 2 workstation were usually 1 to 15 minutes for each stage.

To evaluate the costs and benefits of virtual interconnections, two complex designs were compiled, SPARCLE and the A-1000. SPARCLE is an 18K gate SPARC microprocessor enhanced with multiprocessing features. The Alewife compiler and memory management unit (A-1000) is an 86K gate cache compiler for the Alewife Multiprocessor, a distributed shared memory machine being designed at the Massachusetts Institute of Technology. For target FPGAs, the Xilinx 3000 and 4000 series (including the new 4000H series) and the Concurrent Logic CLI6000 series were considered. This analysis does not include the final FPGA-specific APR stage; a 50 percent APR mapping efficiency for both architectures is assumed.

In the following analysis, the FPGA gate costs of virtual interconnections based on the Concurrent Logic CLI6000 series FPGA were estimated. The phase control logic was assumed to be 300 gates (after mapping). Virtual interconnections overhead can be measured in terms of shift loops. In the CLI6000, a one-bit shift register stage takes 1 of 3136 cells in the 5K gate part (C_s = 3 mapped gates). The total number of shift register bits required for a partition is then equal to the number of logical inputs. When routing in a mesh or torus, intermediate hops cost 1 bit per hop. The gate overhead is then C_s × S, where C_s is the cost of a shift register bit, and S is the number of bits. S is determined by the number of logical inputs, V_i, and M_p, the number of times a physical wire p is multiplexed (this takes into account the shift loop tristate driver and the intermediate hop bits). Gate overhead is then approximately:

    $$ \mathrm{Gate}_{vw} = C_s \times \left( V_i + \sum_{p} M_p \right), $$

Storage of logical outputs is not counted because logical outputs can be overlapped with logical inputs.
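As a worked example of the overhead estimate, the sketch below evaluates the expression for a hypothetical partition. Only C_s = 3 and the 300 gate phase control figure come from the text; the input count and multiplexing factors are invented for illustration.

```python
def virtual_wire_gate_overhead(c_s, logical_inputs, multiplex_counts):
    """Gate overhead = C_s * (V_i + sum over physical wires p of M_p)."""
    return c_s * (logical_inputs + sum(multiplex_counts))


C_S = 3                   # mapped gates per shift register bit (from the text)
PHASE_CONTROL_GATES = 300  # assumed phase control logic size (from the text)

# Hypothetical partition: 200 logical inputs, 50 physical wires each
# multiplexed 4 times.
shift_overhead = virtual_wire_gate_overhead(C_S, 200, [4] * 50)
total_overhead = shift_overhead + PHASE_CONTROL_GATES
print(shift_overhead, total_overhead)
```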

Before compiling the two test designs, their communication requirements were compared to the available FPGA technologies. For this comparison, each design was partitioned for various gate counts and the pin requirements were measured. FIG. 8 shows the resulting curves, plotted on a log-log scale. Note that the partition gate count is scaled to represent mapping inefficiency.

Both design curves and the technology curves fit Rent's Rule, a rule of thumb used for estimating communication requirements in random logic. Rent's Rule can be stated as:

    $$ \frac{\mathrm{pins}_2}{\mathrm{pins}_1} = \left( \frac{\mathrm{gates}_2}{\mathrm{gates}_1} \right)^{b}, $$

where pins₂, gates₂ refer to a partition, pins₁, gates₁ refer to a sub-partition, and b is a constant between 0.4 and 0.7. Table 1 shows the resulting constants. For the technology curve, a constant of 0.5 roughly corresponds to the area versus perimeter for the FPGA die. The lower the constant, the more locality there is within the circuit. Thus, the A-1000 has more locality than SPARCLE, although it has more total communication requirements. As FIG. 8 illustrates, both SPARCLE and the A-1000 will be pin-limited for any choice of FPGA size. In hardwired designs with pin-limited partition sizes, usable gate count is determined solely by available pin resources. For example, a 5000 gate FPGA with 100 pins can only utilize 1000 SPARCLE gates or 250 A-1000 gates.

TABLE 1
Rent's Rule Parameter (slope of log-log curve)
______________________________________
FPGA Technology   SPARCLE   A-1000
______________________________________
0.50              0.06      0.44
______________________________________
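As a rough illustration of how Rent's Rule is applied, the following Python sketch scales a pin count from one partition size to another and recovers the exponent b from two points on a log-log curve. The function names and the sample starting point (100 pins at 1000 gates) are hypothetical and are not taken from the measured designs; only the exponents 0.50 and 0.44 come from Table 1.

```python
import math

def rent_pins(pins_1, gates_1, gates_2, b):
    """Estimate pins_2 from Rent's Rule:
    pins_2 / pins_1 = (gates_2 / gates_1) ** b."""
    return pins_1 * (gates_2 / gates_1) ** b

def rent_exponent(pins_1, gates_1, pins_2, gates_2):
    """Recover b as the slope of the log-log line through two points."""
    return math.log(pins_2 / pins_1) / math.log(gates_2 / gates_1)

# Hypothetical starting point: a 1000-gate sub-partition needing 100 pins.
# Scale to a 4000-gate partition using two exponents from Table 1.
pins_tech = rent_pins(100, 1000, 4000, 0.50)   # FPGA technology curve
pins_a1000 = rent_pins(100, 1000, 4000, 0.44)  # A-1000 design curve

# The slope recovered from the two technology-curve points is the exponent.
b_tech = rent_exponent(100, 1000, pins_tech, 4000)

print(f"technology: {pins_tech:.0f} pins, A-1000: {pins_a1000:.0f} pins, b = {b_tech:.2f}")
```

A lower exponent makes the pin requirement grow more slowly as the partition grows, which is the sense in which a lower constant indicates more locality.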

Next, both designs were compiled for a two dimensional torus and a full crossbar interconnect of 5000 gate, 100 pin FPGAs, 50 percent mapping efficiency. Table 2 shows the results for both hardwires and virtual interconnections. Compiling the A-1000 to a torus, hardwires only, was not practical with the partitioning software. The gate utilizations obtained for the hardwired cases agree with reports in the literature on designs of similar complexity.

TABLE 2
Number of 5K Gate, 100 Pin FPGAs Required for Logic Emulation
___________________________________________________________________
              Hardwires Only              Virtual Interconnections Only
Design        2-D Torus   Full Crossbar   2-D Torus   Full Crossbar
___________________________________________________________________
SPARCLE       >100        31              9           9
(18K gates)   (<7%)       (23%)           (80%)       (80%)
A-1000        Not         >400            49          42
(86K gates)   Practical   (<10%)          (71%)       (83%)
___________________________________________________________________
Number of FPGAs (Average Usable Gate Utilization)

To understand the tradeoffs involved, the hardwires pin/gate constraint and the virtual interconnections pin/gate tradeoff curve were plotted against the partition curves for the two designs (FIG. 9). The intersection of the partition curves and the wire curves gives the optimal partition sizes. This graph shows how virtual interconnections add the flexibility of trading gate resources for pin resources.

Emulation clock cycle time T_(E) is determined by:

1. Communication delay per hop, t_(c);

2. Length of longest path in dependency graph, L;

3. Total FPGA gate delay along longest path, T_(L);

4. Sum of pipeline cycles across all phases, n;

5. Network diameter, D (D = 1 for crossbar); and

6. Average network distance, k_(d) (k_(d) = 1 for crossbar).

The total number of phases and pipeline cycles in each phase are directly related to physical wire contention and the combinatorial path that passes through the largest number of partitions. If the emulation is latency dominated, then the optimal number of phases is L, and the pipeline cycles per phase should be no greater than D, giving:

    n = L \times D.

If the emulation is bandwidth dominated, then the total pipeline cycles (summed over all phases) is at least:

    n = \max_{p} \left( \frac{Vi_{p}}{Pi_{p}} \right),

where Vi_(p) and Pi_(p) are the number of virtual and physical wires for FPGA partition p. If there are hot spots in the network (not possible with a crossbar), the bandwidth dominated delay will be higher. Emulation speeds for SPARCLE and the A-1000 were both latency dominated.
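The two bounds above can be sketched in Python as follows. This is a minimal illustration rather than the compiler's actual scheduling: the function names are hypothetical, the wire counts are made up, rounding the bandwidth bound up to a whole cycle is an assumption, and taking the larger of the two bounds as the estimate is a simplification that ignores hot spots.

```python
import math

def latency_cycles(longest_path, diameter):
    """Latency dominated case: n = L * D."""
    return longest_path * diameter

def bandwidth_cycles(partitions):
    """Bandwidth dominated lower bound: n >= max over p of Vi_p / Pi_p.

    partitions is a list of (virtual_wires, physical_wires) pairs.
    Rounding up to a whole pipeline cycle is an assumption of this sketch.
    """
    return max(math.ceil(v / p) for v, p in partitions)

def pipeline_cycles(longest_path, diameter, partitions):
    """Simplified estimate: whichever bound is larger dominates."""
    return max(latency_cycles(longest_path, diameter),
               bandwidth_cycles(partitions))

# Hypothetical virtual-wire counts for three partitions on a crossbar (D = 1)
# with a longest path of 6 hops.
virtual_wires = [120, 250, 80]

wide = [(v, 100) for v in virtual_wires]    # 100 physical wires per FPGA
narrow = [(v, 20) for v in virtual_wires]   # 20 physical wires per FPGA

n_wide = pipeline_cycles(6, 1, wide)        # latency bound (6) dominates
n_narrow = pipeline_cycles(6, 1, narrow)    # bandwidth bound (13) dominates
print(n_wide, n_narrow)
```

With many physical wires the longest path sets the cycle count; as the pin count shrinks, the most heavily loaded partition takes over.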

Based on CLi6000 specifications, it was assumed that T_(L) = 250 ns and t_(c) = 20 ns (based on a 50 MHz clock). A computation-only delay component and a communication-only delay component were considered. This dichotomy is used to give a lower and upper bound on emulation speed.

The computation-only delay component is given by:

    T_{p} = T_{L} + t_{c} \times n,

where n=0 for the hardwired case.

The communication-only delay component is given by:

    T_{c} = t_{c} \times n.

Table 3 shows the resulting emulation speeds for virtual interconnections and hardwires for the crossbar topology. The emulation clock range given is based on the sum and minimum of the two components (lower and upper bounds). When the use of virtual interconnections allows a design to be partitioned across fewer FPGAs, L is decreased, decreasing T_(c). However, the pipeline stages will increase T_(p) by t_(c) per pipeline cycle.

TABLE 3
Emulation Clock Speed Comparison
____________________________________________________________________
                                    Hardwire      Virtual
                                    Only          Interconnection Only
____________________________________________________________________
SPARCLE  Longest Path               9 hops        6 hops
         Computation Only Delay     250 ns        370 ns
         Communication Only Delay   180 ns        120 ns
         Emulation Clock Range      2.3-5.6 MHz   2.0-8.3 MHz
A-1000   Longest Path               27 hops       17 hops
         Computation Only Delay     250 ns        590 ns
         Communication Only Delay   540 ns        340 ns
         Emulation Clock Range      1.3-4.0 MHz   1.1-2.9 MHz
____________________________________________________________________
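The entries of Table 3 can be reproduced from the two delay components with a short Python sketch. The function and constant names are hypothetical. The sketch assumes the crossbar case (D = 1), so that n equals the longest path in hops, and it reads "sum and minimum" as a lower bound of 1/(T_p + T_c) and an upper bound of 1/min(T_p, T_c). That reading is inferred from the tabulated values rather than stated explicitly in the text.

```python
T_L_NS = 250.0   # FPGA gate delay along the longest path, in ns
T_C_NS = 20.0    # communication delay per hop, in ns (50 MHz clock)

def emulation_clock(hops, virtual):
    """Return (T_p, T_c, f_low, f_high) with delays in ns and clocks in MHz.

    hops is the longest path length on a crossbar (D = 1).
    For hardwires, no pipeline cycles are added to the computation delay.
    """
    n = hops if virtual else 0
    t_p = T_L_NS + T_C_NS * n          # computation-only delay
    t_c = T_C_NS * hops                # communication-only delay
    f_low = 1000.0 / (t_p + t_c)       # lower bound: sum of both components
    f_high = 1000.0 / min(t_p, t_c)    # upper bound: smaller component alone
    return t_p, t_c, f_low, f_high

cases = {
    "SPARCLE hardwire": emulation_clock(9, virtual=False),
    "SPARCLE virtual": emulation_clock(6, virtual=True),
    "A-1000 hardwire": emulation_clock(27, virtual=False),
    "A-1000 virtual": emulation_clock(17, virtual=True),
}

for name, (t_p, t_c, f_low, f_high) in cases.items():
    print(f"{name}: T_p={t_p:.0f} ns, T_c={t_c:.0f} ns, "
          f"{f_low:.1f}-{f_high:.1f} MHz")
```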

In Table 3, the virtual interconnections emulation clock was determined solely by the length of the longest path; the communication was limited by latency, not bandwidth. To determine what happens when the design becomes bandwidth limited, the pin count was varied and the resulting emulation clock (based on T_(c)) was recorded for both a crossbar and torus topology. FIG. 10 shows the results for the A-1000. The knee of the curve is where the latency switches from bandwidth dominated to latency dominated. The torus is slower because it has a larger diameter, D. However, the torus moves out of the latency dominated region sooner because it exploits locality; several short wires can be routed during the time of a single long wire. Note that this analysis assumes the crossbar can be clocked as fast as the torus; the increase in emulation speed obtained with the crossbar is lower if t_(c) is adjusted accordingly.

With virtual interconnections, neither design was bandwidth limited; each was limited instead by its critical path. As shown in FIG. 10, the A-1000 needs only about 20 pins per FPGA to run at the maximum emulation frequency. While this allows the use of lower pin count (and thus cheaper) FPGAs, another option is to trade this surplus bandwidth for speed. This tradeoff is accomplished by hardwiring logical interconnections at both ends of the critical paths. Critical wires can be hardwired until there is no more surplus bandwidth, thus fully utilizing both gate and pin resources. For designs on the 100 pin FPGAs, hardwiring reduces the longest critical path from 6 to 3 for SPARCLE and from 17 to 15 for the A-1000.

Virtual interconnections allow maximum utilization of FPGA gate resources at emulation speeds competitive with existing hardwired techniques. This technique is independent of topology. Virtual interconnections allow the use of less complex topologies, such as a torus instead of a crossbar, in cases where such a topology was not practical otherwise.

Using timing and/or locality sensitive partitioning with virtual interconnections has potential for reducing the required number of routing sub-cycles. Communication bandwidth can be further increased with pipeline compaction, a technique for overlapping the start and end of long virtual paths with shorter paths traveling in the same direction. A more robust implementation of virtual interconnections replaces the global barrier imposed by routing phases with a finer granularity of communication scheduling, possibly overlapping computation and communication as well.

Using the information gained from dependency analysis, one can now predict which portions of the design are active during which parts of the emulation clock cycle. If the FPGA device supports fast partial reconfiguration, this information can be used to implement virtual logic via invocation of hardware subroutines. An even more ambitious direction is event-driven emulation: only send signals that change, and only activate (configure) logic when it is needed.

Equivalents

Those skilled in the art will know, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein.

These and all other equivalents are intended to be encompassed by the following claims.

The invention claimed is:
 1. A reconfigurable electronic system comprising: a plurality of reprogrammable logic modules, each logic module having a plurality of pins for communicating signals external to the logic module and a plurality of logic elements for implementing logic in hardware; inter-module connections between pins of different logic modules; and a configurer for automatically configuring each logic module to define a partition of a specified target circuit with communications between the partitions of the target circuit being provided through pins and inter-module connections, a partition of the configured system having a number of inter-module communications to other partitions that exceeds the number of pins on the logic module of the partition and the logic module of the partition being configured to communicate through virtual interconnections in a time-multiplexed fashion through at least one pin of the logic module of the partition, the configurer determining a static virtual interconnection which includes a communication path extending through an intermediate reprogrammable logic module.
 2. A system as claimed in claim 1 wherein each logic module comprises an array of interconnected programmable logic cells.
 3. A system as claimed in claim 1 wherein the configurer configures a logic module to form a multiplexer for communicating through virtual interconnections.
 4. A system as claimed in claim 1 wherein the logic modules are configured to operate in phases within a target clock period with the inter-module communications being performed within each phase.
 5. A system as claimed in claim 4 wherein the configurer optimizes logic module selection and phase division of the target circuit based on inter-module dependencies.
 6. A system as claimed in claim 4 wherein the target clock period is a clock period of the target circuit which dictates the maximum rate at which signal lines of the target circuit change value and wherein each target clock period comprises a plurality of system clock periods which dictate the maximum rate at which signals in the electronic system change value.
 7. A system as claimed in claim 1, including asynchronous logic hardwired to dedicated pins of the logic modules.
 8. A system as claimed in claim 1 wherein data is accessed from memory elements external to the logic modules.
 9. A system as claimed in claim 8 wherein there are time multiplexed interconnections between the logic modules and the memory elements.
 10. A system as claimed in claim 1 wherein the logic modules are Field Programmable Gate Arrays (FPGAs).
 11. A system as claimed in claim 1 wherein each logic module is a single chip.
 12. A system as claimed in claim 1 wherein the digital system is an emulation system for emulating the target system.
 13. A system as claimed in claim 1 wherein logic modules are configured to include pins dedicated to individual signals.
 14. A logic system as claimed in claim 1 wherein the configurer comprises a partitioner for partitioning the target logic circuit, each partition being configured into a respective logic module.
 15. A system as claimed in claim 14 further comprising a dependency analyzer and a divider for dividing a target clock period into phases during which program logic functions are performed and signals are transmitted between the logic modules, the phase division being a function of partition dependencies and memory assignments.
 16. A system as claimed in claim 14 further comprising a router for configuring the logic modules to route signals between logic modules through intermediate reprogrammable logic modules.
 17. A reconfigurable electronic system comprising: a plurality of reprogrammable logic modules, each logic module having an array of gates reconfigurable to define a hardware logic circuit and a plurality of pins for communicating signals external to the logic module; inter-module connections between pins of different logic modules; and a configurer for automatically configuring the array of gates, each logic module to define a partition of a specified target circuit with communications between the partitions of the target circuit being provided through pins and inter-module connections a partition of the configured system having a number of inter-module communications to other partitions that exceeds the number of pins on the logic module of the partition and gates of the logic module of the partition being configured as a multiplexer for receiving a plurality of outputs from the configured array of gates and for statically communicating the received outputs through a single pin in a multiplexed, pipelined fashion.
 18. A system as claimed in claim 17 further comprising at least one shift register coupled between the multiplexer and the configured gate array.
 19. A system as claimed in claim 18 wherein the shift registers are configured from gates in the logic module.
 20. A reconfigurable electronic system comprising: a plurality of reprogrammable logic modules, each logic module having a plurality of pins for communicating signals external to the logic module and a plurality of logic elements for implementing logic in hardware; inter-module connections between pins of different logic modules; and a configurer for automatically configuring each logic module to define a partition of a specified target circuit with communications between the partitions of the target circuit being provided through pins and inter-module connections, a partition of the configured system having a number of inter-module communications to other partitions that exceeds the number of pins on the logic module of the partition and the logic module of the partition being configured to communicate through static virtual interconnections in a time-multiplexed fashion through at least one pin of the logic module of the partition, the electronic system including dedicated pins for providing a predetermined signal.
 21. A method of automatically compiling a reconfigurable digital system, comprising the steps of: partitioning a target circuit into a plurality of partitions, each partition to be configured into a reprogrammable logic module having a plurality of pins and a plurality of logic elements; configuring the logic modules to create partitions of the target circuit; determining static virtual interconnections between partitions corresponding to at least one common pin with signals along the virtual interconnections being time-multiplexed through the at least one common pin; and configuring the logic modules to route signals between logic modules through intermediate logic modules.
 22. A method as claimed in claim 21 further comprising the step of dividing a first clock period which dictates the maximum rate at which signal lines within the target circuit change value into phases during which program logic functions are performed and signals are transmitted between logic modules.
 23. A reconfigurable logic module comprising: an array of gates configurable to define a logic circuit; and a virtual interconnection comprising a plurality of gates reconfigured as a multiplexer, the multiplexer receiving a plurality of outputs from the configured gate array and communicating each of the received outputs through a single pin in a multiplexed, pipelined fashion.
 24. A logic module as claimed in claim 23 further comprising at least one register coupled between the multiplexer and the configured gate array.
 25. A logic module as claimed in claim 24 wherein the at least one register is configured from gates in the logic module.
 26. A logic module as claimed in claim 24 wherein the at least one register includes a shift register.
 27. A logic module as claimed in claim 23 wherein the logic module is a Field Programmable Gate Array (FPGA).
 28. A reconfigurable electronic system comprising a plurality of reprogrammable logic modules, each logic module having a plurality of pins for communicating signals between logic modules; and a configurer to configure each logic module to define a partition of a specified target circuit, a partition of the configured target circuit having a number of interconnections to other partitions that exceeds the number of pins on the logic module and the logic module being configured to communicate through virtual interconnections through at least one pin.
 29. A system as claimed in claim 28 wherein the configurer configures a logic module to form a multiplexer for communicating through virtual interconnections.
 30. A system as claimed in claim 28 wherein pins of logic modules are directly connected to pins of other logic modules and routing of signals between the logic modules is through intermediate logic modules.
 31. A system as claimed in claim 28 wherein the logic modules comprise hardwired multiplexers.
 32. A system as claimed in claim 28 wherein the configurer optimizes logic module selection and phase division of the target circuit based on interpartition dependencies.
 33. A system as claimed in claim 28 wherein data is accessed from memory elements external to the logic modules.
 34. A system as claimed in claim 28 wherein the logic modules are Field Programmable Gate Arrays (FPGAs).
 35. A system as claimed in claim 28 wherein the system is an emulation system for emulating the target circuit.
 36. A system as claimed in claim 28 further comprising inter-module connections between pins of different logic modules with interconnections between the partitions of the target circuit being provided through pins and inter-module connections.
 37. A system as claimed in claim 36 wherein the inter-module communications include interconnections which extend through intermediate reconfigurable logic modules.
 38. A system as claimed in claim 28 wherein the logic modules are configured to operate in phases within a target clock period with inter-module communications being performed within each phase.
 39. A system as claimed in claim 28 wherein the logic module is configured to communicate through virtual interconnections in a time-multiplexed fashion.